Inducing Information Extraction Systems for New Languages via Cross-language Projection
نویسندگان
چکیده
Information extraction (IE) systems are costly to build because they require development texts, parsing tools, and specialized dictionaries for each application domain and each natural language that needs to be processed. We present a novel method for rapidly creating IE systems for new languages by exploiting existing IE systems via crosslanguage projection. Given an IE system for a source language (e.g., English), we can transfer its annotations to corresponding texts in a target language (e.g., French) and learn information extraction rules for the new language automatically. In this paper, we explore several ways of realizing both the transfer and learning processes using off-theshelf machine translation systems, induced word alignment, attribute projection, and transformationbased learning. We present a variety of experiments that show how an English IE system for a plane crash domain can be leveraged to automatically create a French IE system for the same domain.
منابع مشابه
Multilingual Open Relation Extraction Using Cross-lingual Projection
Open domain relation extraction systems identify relation and argument phrases in a sentence without relying on any underlying schema. However, current state-of-the-art relation extraction systems are available only for English because of their heavy reliance on linguistic tools such as part-of-speech taggers and dependency parsers. We present a cross-lingual annotation projection method for la...
متن کاملA New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
متن کاملA Cross-lingual Annotation Projection-based Self-supervision Approach for Open Information Extraction
Open information extraction (IE) is a weakly supervised IE paradigm that aims to extract relation-independent information from large-scale natural language documents without significant annotation efforts. A key challenge for Open IE is to achieve self-supervision, in which the training examples are automatically obtained. Although the feasibility of Open IE systems has been demonstrated for En...
متن کاملBootstrapping Term Extractors for Multiple Languages
Terminology extraction resources are needed for a wide range of human language technology applications, including knowledge management, information extraction, semantic search, cross-language information retrieval and automatic and assisted translation. We report a low cost method for creating terminology extraction resources for 21 non-English EU languages. Using parallel corpora and a project...
متن کاملExperiments in Cross Language Query Focused Multi-Document Summarization
The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual information robustly and efficiently, with as high quality performance as possible. Previous research activities on multilingual information access systems have studied cross-language information retrieval (CLIR), information ...
متن کامل